skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Search for: All records

Creators/Authors contains: "Tompkin, James"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Large models have shown generalization across datasets for many low-level vision tasks, like depth estimation, but no such general models exist for scene flow. Even though scene flow prediction has wide potential, its practical use is limited because of the lack of generalization of current predictive models. We identify three key challenges and propose solutions for each. First, we create a method that jointly estimates geometry and motion for accurate prediction. Second, we alleviate scene flow data scarcity with a data recipe that affords us 1M annotated training samples across diverse synthetic scenes. Third, we evaluate different parameterizations for scene flow prediction and adopt a natural and effective parameterization. Our model outperforms existing methods as well as baselines built on large-scale models in terms of 3D end-point error, and shows zero-shot generalization to the casually captured videos from DAVIS and the robotic manipulation scenes from RoboTAP. Overall, our approach makes scene flow prediction more practical in-the-wild. Website: https://research.nvidia.com/labs/lpr/zero msf/ 
    more » « less
    Free, publicly-accessible full text available June 11, 2026
  2. We present a method to reconstruct dynamic scenes from monocular continuous-wave time-of-flight (C-ToF) cameras using raw sensor samples that achieves similar or better accuracy than neural volumetric approaches and is 100×faster. Quickly achieving high-fidelity dynamic 3D reconstruction from a single viewpoint is a significant challenge in computer vision. In C-ToF radiance field reconstruction, the property of interest—depth—is not directly measured, causing an additional challenge. This problem has a large and underappreciated impact upon the optimization when using a fast primitive-based scene representation like 3D Gaussian splatting, which is commonly used with multi-view data to produce satisfactory results and is brittle in its optimization otherwise. We incorporate two heuristics into the optimization to improve the accuracy of scene geometry represented by Gaussians. Experimental results show that our approach produces accurate reconstructions under constrained C-ToF sensing conditions, including for fast motions like swinging baseball bats. https://visual.cs.brown.edu/gftorf 
    more » « less
    Free, publicly-accessible full text available June 11, 2026
  3. Free, publicly-accessible full text available December 9, 2025
  4. Free, publicly-accessible full text available December 3, 2025
  5. Free, publicly-accessible full text available October 31, 2025
  6. Augmented reality (AR) area labels can visualize real world regions with arbitrary boundaries and show invisible objects or features. But environment conditions such as lighting and clutter can decrease fixed or passive label visibility, and labels that have high opacity levels can occlude crucial details in the environment. We design and evaluate active AR area label visualization modes to enhance visibility across real-life environments, while still retaining environment details within the label. For this, we define a distant characteristic color from the environment in perceptual CIELAB space, then introduce spatial variations among label pixel colors based on the underlying environment variation. In a user study with 18 participants, we found that our active label visualization modes can be comparable in visibility to a fixed green baseline by Gabbard et al., and can outperform it with added spatial variation in cluttered environments, across varying levels of lighting (e.g., nighttime), and in environments with colors similar to the fixed baseline color. 
    more » « less